Loan Data Exploration by Allan Visochek
========================================================

## Investigate the following variables

Borrower Attributes:

>CreditGrade
>CreditScoreRangeLower
>EmploymentStatus
>IsBorrowerHomeOwner
>LoanMonthsSinceOrigination
>StatedMonthlyIncome
>IncomeRange
>BorrowerState
>Occupation
>DebtToIncomeRatio

Loan Attributes:

>Term
>BorrowerRate
>LoanOriginalAmount
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
## 10%
## 0.09886
## 90%
## 0.3099
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
##                     Employed     Full-time Not available  Not employed
##          2255         67322         26355          5347           835
##         Other     Part-time       Retired Self-employed
##          3806          1088           795          6134
## False True
## 56459 57478
##          AK    AL    AR    AZ    CA    CO    CT    DC    DE    FL    GA
##  5515   200  1679   855  1901 14717  2210  1627   382   300  6720  5008
##    HI    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI
##   409   186   599  5921  2078  1062   983   954  2242  2821   101  3593
##    MN    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY
##  2318  2615   787   330  3084    52   674   551  3097   472  1090  6729
##    OH    OK    OR    PA    RI    SC    SD    TN    TX    UT    VA    VT
##  4197   971  1817  2972   435  1122   189  1737  6842   877  3278   207
##    WA    WI    WV    WY
##  3048  1842   391   150
##                                                        Accountant/CPA
##                               3588                               3233
## Administrative Assistant Analyst
## 3688 3602
## Architect Attorney
## 213 1046
## Biologist Bus Driver
## 125 316
## Car Dealer Chemist
## 180 145
## Civil Service Clergy
## 1457 196
## Clerical Computer Programmer
## 3164 4478
## Construction Dentist
## 1790 68
## Doctor Engineer - Chemical
## 494 225
## Engineer - Electrical Engineer - Mechanical
## 1125 1406
## Executive Fireman
## 4311 422
## Flight Attendant Food Service
## 123 1123
## Food Service Management Homemaker
## 1239 120
## Investor Judge
## 214 22
## Laborer Landscaping
## 1595 236
## Medical Technician Military Enlisted
## 1117 1272
## Military Officer Nurse (LPN)
## 346 492
## Nurse (RN) Nurse's Aide
## 2489 491
## Other Pharmacist
## 28617 257
## Pilot - Private/Commercial Police Officer/Correction Officer
## 199 1578
## Postal Service Principal
## 627 312
## Professional Professor
## 13628 557
## Psychologist Realtor
## 145 543
## Religious Retail Management
## 124 2602
## Sales - Commission Sales - Retail
## 3446 2797
## Scientist Skilled Labor
## 372 2746
## Social Worker Student - College Freshman
## 741 41
## Student - College Graduate Student Student - College Junior
## 245 112
## Student - College Senior Student - College Sophomore
## 188 69
## Student - Community College Student - Technical School
## 28 16
## Teacher Teacher's Aide
## 3759 276
## Tradesman - Carpenter Tradesman - Electrician
## 120 477
## Tradesman - Mechanic Tradesman - Plumber
## 951 102
## Truck Driver Waiter/Waitress
## 1675 436
##           A    AA     B     C     D     E    HR
## 29084 14551  5372 15581 18345 14274  9795  6935
##           A    AA     B     C     D     E    HR    NC
## 84984  3315  3509  4389  5649  5153  3289  3508   141
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
##             $0      $100,000+      $1-24,999 $25,000-49,999 $50,000-74,999
##            621          17337           7274          32192          31050
## $75,000-99,999  Not displayed   Not employed
##          16916           7741            806
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750000
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
There are 113,937 loans in the dataset with 81 features, 13 of which were used in the analysis:
Numeric variables: BorrowerRate, CreditScoreRangeLower, DebtToIncomeRatio, LoanOriginalAmount, StatedMonthlyIncome
Term: 60, 36, 12
>(note that Term was a numerical variable but was converted to a factor because it takes only three values)
CreditGrade: AA, A, B, C, D, E, HR, NC, none
ProsperRating: AA, A, B, C, D, E, HR, none
IncomeRange: $100,000+; $75,000-99,999; $50,000-74,999; $25,000-49,999; $1-24,999; $0; Not employed; Not displayed
EmploymentStatus: Employed, Full-time, Not employed, Part-time, Retired, Self-employed, Other, Not available, none
Occupation: Accountant/CPA, Administrative Assistant, Analyst, Architect, Attorney, Biologist, Bus Driver, Car Dealer, Chemist, Civil Service, Clergy, Clerical, Computer Programmer, Construction, Dentist, Doctor, Engineer - Chemical, Engineer - Electrical, Engineer - Mechanical, Executive, Fireman, Flight Attendant, Food Service, Food Service Management, Homemaker, Investor, Judge, Laborer, Landscaping, Medical Technician, Military Enlisted, Military Officer, Nurse (LPN), Nurse (RN), Nurse’s Aide, Other, Pharmacist, Pilot - Private/Commercial, Police Officer/Correction Officer, Postal Service, Principal, Professional, Professor, Psychologist, Realtor, Religious, Retail Management, Sales - Commission, Sales - Retail, Scientist, Skilled Labor, Social Worker, Student - College Freshman, Student - College Graduate Student, Student - College Junior, Student - College Senior, Student - College Sophomore, Student - Community College, Student - Technical School, Teacher, Teacher’s Aide, Tradesman - Carpenter, Tradesman - Electrician, Tradesman - Mechanic, Tradesman - Plumber, Truck Driver, Waiter/Waitress
BorrowerState: AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA, HI, IA, ID, IL, IN, KS, KY, LA, MA, MD, ME, MI, MN, MO, MS, MT, NC, ND, NE, NH, NJ, NM, NV, NY, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WA, WI, WV, WY
The majority of loans go to borrowers whose employment status is listed as Employed or Full-time.
Few loans are given out to individuals with low income, or who are unemployed.
Most Borrowers do not have a credit grade.
Loan amounts range from $1,000 to $35,000.
75% of loans are for $12,000 or less.
Nearly all of the loans given out are 100% funded.
Most loans have 0 net principal loss.
Most borrowers have 0 delinquencies in the past 7 years.
Few loans were given out from Q4 2008 through Q2 2009.
The main features of interest are LoanOriginationQuarter and BorrowerRate.
It is hard to say at this point. All variables mentioned above were selected for the investigation because they are likely to have an impact on the borrower rate.
HasCreditGrade <- CreditGrade is available
HasProsperRating <- ProsperRating is available
HasIncome <- IncomeRange is available and greater than 0
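A minimal sketch of how these three indicator variables might be derived in base R. The data frame `loans` and its values are hypothetical. The sketch assumes that a missing grade or rating is stored as an empty string (as the blank level in the tables above suggests) and that "available and greater than 0" excludes the `$0`, `Not employed` and `Not displayed` levels.

```r
# Toy stand-in for the loan data (hypothetical values, not from the dataset).
loans <- data.frame(
  CreditGrade   = c("A", "", "", "HR"),
  ProsperRating = c("", "B", "AA", ""),
  IncomeRange   = c("$25,000-49,999", "$0", "Not displayed", "$100,000+"),
  stringsAsFactors = FALSE
)

# Assumption: a missing grade or rating is an empty string (or NA).
loans$HasCreditGrade   <- !is.na(loans$CreditGrade)   & loans$CreditGrade   != ""
loans$HasProsperRating <- !is.na(loans$ProsperRating) & loans$ProsperRating != ""

# Assumption: these three levels count as "no usable income range".
no_income <- c("$0", "Not employed", "Not displayed")
loans$HasIncome <- !is.na(loans$IncomeRange) & !(loans$IncomeRange %in% no_income)

print(loans[, c("HasCreditGrade", "HasProsperRating", "HasIncome")])
```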
DebtToIncomeRatio
StatedMonthlyIncome
ProsperScore..Alpha
CreditGrade
IncomeRange
LoanOriginationQuarter
## Term: 12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0400 0.0929 0.1434 0.1501 0.2064 0.2669
## --------------------------------------------------------
## Term: 36
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1274 0.1815 0.1935 0.2599 0.4975
## --------------------------------------------------------
## Term: 60
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0669 0.1490 0.1870 0.1930 0.2319 0.3304
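Per-group summaries like the ones above can be produced with `by()`. The following is only a sketch on a hypothetical toy data frame named `loans`; the exact call used for this report is not shown in the output, so this is an assumption.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  Term         = c(12, 36, 36, 60, 60, 36),
  BorrowerRate = c(0.10, 0.20, 0.30, 0.15, 0.25, 0.10)
)

# Term takes only a few values, so treat it as a factor.
loans$Term <- factor(loans$Term)

# One summary() of BorrowerRate per Term level.
rate_by_term <- by(loans$BorrowerRate, loans$Term, summary)
print(rate_by_term)

# The same idea reduced to a single statistic per group.
median_by_term <- tapply(loans$BorrowerRate, loans$Term, median)
print(median_by_term)
```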
The average BorrowerRate decreased significantly from 2012 to 2014.
The average loan amount dropped steeply in Q4 2008 and then rose steadily until 2014.
It doesn’t look like there is a significant relationship from this graph.
It looks like there is a trend, although it is still not quite clear.
Debt level quantile and median BorrowerRate are correlated.
Borrowers without a credit grade have a borrower rate that is about average relative to the others.
Borrower rates are cleanly separated across the different Prosper ratings (in contrast to the credit grades, which overlap).
Borrowers without a Prosper rating have a median borrower rate that is about average relative to the others.
Credit score varies much less across Prosper ratings than it does across credit grades.
In general, higher-paying occupations with a higher level of education (e.g. Judge, Computer Programmer, Engineer) have a lower median borrower rate than lower-paying occupations with lower levels of education (e.g. Teacher’s Aide, Nurse’s Aide, College Freshman).
The average BorrowerRate decreased significantly from 2012 to 2014.
Debt level quantile and median BorrowerRate are correlated.
Borrowers without a credit grade have a borrower rate that is about average relative to the others.
Borrower rates are cleanly separated across the different Prosper ratings (in contrast to the credit grades, which overlap).
Borrowers without a Prosper rating have a median borrower rate that is about average relative to the others.
The borrower rate goes down as income range goes up, with the exception of the $0 category.
There is a clean, linear relationship between the StatedMonthlyIncome quantile and the median BorrowerRate.
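The quantile-versus-median comparison can be sketched as follows. The toy data frame `loans`, the choice of four bins, and the labels `Q1` to `Q4` are all assumptions made for illustration; the number of quantile bins used in the report is not stated.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  StatedMonthlyIncome = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000),
  BorrowerRate        = c(0.30, 0.28, 0.24, 0.22, 0.18, 0.16, 0.12, 0.10)
)

# Break points at the quartiles of StatedMonthlyIncome (assumed bin count: 4).
breaks <- quantile(loans$StatedMonthlyIncome, probs = seq(0, 1, 0.25),
                   na.rm = TRUE)

# Assign each loan to its income quantile bin.
loans$IncomeQuantile <- cut(loans$StatedMonthlyIncome, breaks = breaks,
                            include.lowest = TRUE, labels = paste0("Q", 1:4))

# Median BorrowerRate within each bin.
median_rate <- tapply(loans$BorrowerRate, loans$IncomeQuantile, median)
print(median_rate)
```

On real data with many tied values (for example a ratio that is often 0), `quantile()` can return duplicate break points, and `cut()` will then stop with an error; the breaks would need to be de-duplicated and the labels adjusted to match.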
In general, higher-paying occupations with a higher level of education (e.g. Judge, Computer Programmer, Engineer) have a lower median borrower rate than lower-paying occupations with lower levels of education (e.g. Teacher’s Aide, Nurse’s Aide, College Freshman).
Credit score varies much less across Prosper ratings than it does across credit grades.
Most borrowers who have an IncomeRange recorded have a StatedMonthlyIncome that falls within that range; however, there are quite a few exceptions.
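One way to check whether StatedMonthlyIncome agrees with IncomeRange is to annualise the monthly figure and bin it with the same boundaries. This is a sketch only: it assumes IncomeRange describes annual income, and the data frame `loans` and the column `RangeMatches` are hypothetical.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  IncomeRange = c("$25,000-49,999", "$50,000-74,999", "$1-24,999", "$100,000+"),
  StatedMonthlyIncome = c(3000, 2000, 1500, 9000),
  stringsAsFactors = FALSE
)

# Assumption: IncomeRange is annual, so multiply the monthly income by 12.
annual <- loans$StatedMonthlyIncome * 12

# Bin the annual figure using the IncomeRange boundaries.
# right = FALSE makes each bin closed on the left, e.g. [25000, 50000).
# An annual income below 1 falls outside all bins and becomes NA.
implied <- cut(annual,
               breaks = c(1, 25000, 50000, 75000, 100000, Inf),
               labels = c("$1-24,999", "$25,000-49,999", "$50,000-74,999",
                          "$75,000-99,999", "$100,000+"),
               right = FALSE)

# TRUE where the recorded range matches the range implied by monthly income.
loans$RangeMatches <- as.character(implied) == loans$IncomeRange
print(loans)
```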
The strongest relationships were between BorrowerRate and CreditGrade, and between BorrowerRate and ProsperRating.
The distribution of borrower rates is less even from 2009 through 2012.
This may have to do with the relatively low number of borrowers during that time period.
The average loan amount is closely correlated with the number of loans given out.
The inverse of the BorrowerRate seems to lag the trends in the number of loans and LoanOriginalAmount by about 2 years.
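The relationship between loan volume and average loan amount can be checked by aggregating per quarter and correlating the two series. This is a rough sketch on a hypothetical toy data frame `loans`; the values are invented for illustration.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  LoanOriginationQuarter = c("Q1 2012", "Q1 2012",
                             "Q2 2012", "Q2 2012", "Q2 2012",
                             "Q3 2012", "Q3 2012", "Q3 2012", "Q3 2012"),
  LoanOriginalAmount = c(4000, 6000,
                         6000, 7000, 8000,
                         8000, 9000, 10000, 9000),
  stringsAsFactors = FALSE
)

# Number of loans and mean loan amount per quarter.
n_loans     <- tapply(loans$LoanOriginalAmount, loans$LoanOriginationQuarter, length)
mean_amount <- tapply(loans$LoanOriginalAmount, loans$LoanOriginationQuarter, mean)

per_quarter <- data.frame(
  Quarter     = names(n_loans),
  n_loans     = as.vector(n_loans),
  mean_amount = as.vector(mean_amount)
)

# Correlation between the two quarterly series.
quarter_cor <- cor(per_quarter$n_loans, per_quarter$mean_amount)
print(per_quarter)
print(quarter_cor)
```

Note that quarter labels such as "Q1 2012" sort alphabetically, not chronologically, so they would need to be reordered before plotting a time series or estimating a lag.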
OK, now I get it… The company used CreditGrade to determine the borrower rate up until 2009 and then started using its own metric, namely the Prosper Rating.
Let's make a new variable to separate the earlier loans from the later ones.
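A minimal sketch of such a variable. It assumes LoanOriginationQuarter is stored as strings like "Q4 2008" and places the cutoff at the start of 2009; both the cutoff and the names `Period`, `early` and `late` are assumptions for illustration, not the definition used in the report.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  LoanOriginationQuarter = c("Q1 2007", "Q3 2008", "Q3 2009", "Q4 2013"),
  stringsAsFactors = FALSE
)

# Assumption: quarters look like "Q4 2008"; strip the "Qn " prefix to get the year.
year <- as.integer(sub("^Q[1-4] ", "", loans$LoanOriginationQuarter))

# Assumption: loans originated before 2009 belong to the earlier period.
cutoff_year <- 2009L
loans$Period <- factor(ifelse(year < cutoff_year, "early", "late"),
                       levels = c("early", "late"))

print(table(loans$Period))
```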
## Warning: Removed 569 rows containing missing values (geom_point).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 315 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 315 rows containing missing values (stat_summary).
## Warning: Removed 254 rows containing missing values (stat_summary).
## Warning: Removed 315 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).
Credit score is correlated with BorrowerRate, although that correlation changes by quarter.
As of 2009, there is a strict cutoff at a credit score of 600.
It looks like there isn’t much variation in BorrowerRate by credit score for borrowers with a credit score of less than 600.
I will define bad credit as having a credit score of less than 600 and treat these loans as a separate category.
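This definition can be sketched as a logical flag. The column name `BadCredit` and the toy data frame `loans` are hypothetical; loans with a missing credit score stay `NA` and are not forced into either category.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(CreditScoreRangeLower = c(520, 599, 600, 720, NA))

# Bad credit: lower bound of the credit score range below 600.
bad_credit_cutoff <- 600
loans$BadCredit <- loans$CreditScoreRangeLower < bad_credit_cutoff

# Count each category, including loans with no recorded score.
print(table(loans$BadCredit, useNA = "ifany"))
```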
There is at least some relationship between LoanOriginalAmount and BorrowerRate.
All loans in the earlier period are 3 year loans.
The correlation between the borrower rate and the loan amount becomes stronger as the CreditGrade improves.
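A per-grade correlation can be computed by splitting the data on CreditGrade. The sketch below uses a hypothetical toy data frame `loans` with only two grades; the values are invented and say nothing about the sign or size of the correlation in the real data.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  CreditGrade        = rep(c("HR", "A"), each = 5),
  LoanOriginalAmount = c(1000, 2000, 3000, 4000, 5000,
                         1000, 2000, 3000, 4000, 5000),
  BorrowerRate       = c(0.30, 0.28, 0.31, 0.29, 0.30,
                         0.08, 0.10, 0.12, 0.14, 0.16),
  stringsAsFactors = FALSE
)

# Pearson correlation between amount and rate within each credit grade.
cor_by_grade <- sapply(split(loans, loans$CreditGrade), function(g) {
  cor(g$LoanOriginalAmount, g$BorrowerRate, use = "complete.obs")
})

print(round(cor_by_grade, 2))
```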
It looks like loan original amount was used to determine the BorrowerRate up until 2011.
Let's look at this in a bit more detail…
The correlation between loan amount and borrower rate also becomes stronger as the Prosper rating improves.
1-year loans were only offered as of Q4 2010, so the loan original amount clearly does not affect the borrower rate for any such loans.
The same goes for the 5-year loans.
The median borrower rate is related to IncomeRange.
The trend towards lower BorrowerRate for higher StatedMonthlyIncome is consistent across all of the quarters.
The trend towards higher BorrowerRate for higher DebtToIncomeRatio is consistent across all of the quarters.
The quantile of available monthly income is correlated with the median BorrowerRate.
#### Let's avoid this for now…
The average loan amount is closely correlated with the number of loans given out.
The inverse of the BorrowerRate seems to lag the trends in the number of loans and LoanOriginalAmount by about 2 years.
Credit score is correlated with BorrowerRate, although that correlation changes by quarter.
As of 2009, there is a strict cutoff at a credit score of 600.
It looks like there isn’t much variation in BorrowerRate by credit score for borrowers with a credit score of less than 600.
The correlation between the borrower rate and the loan amount becomes stronger as the CreditGrade improves.
The correlation between the borrower rate and the loan amount becomes stronger as the Prosper rating improves.
The correlation between loan amount and borrower rate exists until 2011.
The median borrower rate is related to IncomeRange.
The trend towards lower BorrowerRate for higher StatedMonthlyIncome is consistent across all of the quarters.
The quantile of available monthly income is correlated with the median BorrowerRate.

### Were there any interesting or surprising interactions between features?
As observed in the Credi
All loans in the earlier period are 3 year loans.
1 and 5 year loans are only offered as of 2011.
Yes… get to this later…
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
This Prosper loan data is a rich and sophisticated dataset with multiple interrelated variables. While I was able to identify trends in the borrower rate